Including prosodic cues in ASR systems
Abstract
Several aspects of speech production, as well as of natural speech perception, have gradually been incorporated into automatic speech recognition systems. Nevertheless, the prosodic characteristics of speech have not yet been used explicitly in the recognition process itself. This work presents an analysis of the three most important prosodic parameters (energy, fundamental frequency and duration), together with a method for incorporating this information into automatic speech recognition. Prosodic-accentual features are incorporated into a hidden Markov model recognizer, and their theoretical formulation and experimental setup are presented. Several experiments on a Spanish continuous speech database show how the method behaves. Building on these results and using other subsets of the database, the overall results show that incorporating prosodic-accentual cues can reduce the word recognition error by more than 30%.
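The abstract names energy, fundamental frequency and duration as the three prosodic parameters but does not say how they are measured. The following is a minimal sketch, assuming a mono signal given as a list of float samples and segment boundaries (for example syllable start and end samples) supplied by some alignment step. The function names, frame sizes, voicing threshold and the simple autocorrelation pitch estimator are all hypothetical illustrative choices, not the authors' front end, and how such features enter the HMM recognizer is not shown.

```python
import math

# Hypothetical helpers: an illustrative prosodic front end, NOT the paper's method.


def frames(x, frame_len, hop):
    """Split a list of samples into overlapping frames, dropping a trailing partial frame."""
    return [x[i:i + frame_len] for i in range(0, len(x) - frame_len + 1, hop)]


def log_energy(frame):
    """Natural log of the mean squared amplitude; the floor avoids log(0) on digital silence."""
    return math.log(sum(s * s for s in frame) / len(frame) + 1e-10)


def f0_autocorr(frame, sr, fmin=60.0, fmax=400.0, voicing_threshold=0.3):
    """Crude F0 estimate (assumed method): pick the lag in [sr/fmax, sr/fmin] with the
    largest autocorrelation normalized by frame energy. Returns 0.0 if judged unvoiced."""
    r0 = sum(s * s for s in frame)
    if r0 == 0.0:
        return 0.0
    min_lag = max(1, int(sr / fmax))
    max_lag = min(int(sr / fmin), len(frame) - 1)
    best_lag, best_r = 0, 0.0
    for lag in range(min_lag, max_lag + 1):
        r = sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag)) / r0
        if r > best_r:
            best_lag, best_r = lag, r
    if best_lag == 0 or best_r < voicing_threshold:
        return 0.0
    return sr / best_lag


def prosodic_features(x, sr, start, end, frame_s=0.04, hop_s=0.01):
    """Duration, mean log energy and mean F0 (over voiced frames only) for x[start:end]."""
    seg = x[start:end]
    fr = frames(seg, int(frame_s * sr), int(hop_s * sr))
    energies = [log_energy(f) for f in fr]
    pitches = [f0_autocorr(f, sr) for f in fr]
    voiced = [p for p in pitches if p > 0.0]
    return {
        "duration_s": (end - start) / sr,
        "mean_log_energy": sum(energies) / len(energies) if energies else float("-inf"),
        "mean_f0_hz": sum(voiced) / len(voiced) if voiced else 0.0,
    }


if __name__ == "__main__":
    sr = 8000
    # Synthetic 0.5 s, 200 Hz tone standing in for a voiced syllable.
    tone = [math.sin(2 * math.pi * 200 * n / sr) for n in range(sr // 2)]
    print(prosodic_features(tone, sr, 0, len(tone)))
```

Per-segment values like these could then be compared across syllables (for instance to look for the higher energy, higher F0 or longer duration often associated with accented syllables); a production system would normally use a more robust pitch tracker than this single-peak autocorrelation.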
Similar resources
Production of English Lexical Stress by Persian EFL Learners
This study examines the phonetic properties of lexical stress in English produced by Persian speakers learning English as a foreign language. The four most reliable phonetic correlates of English lexical stress, namely fundamental frequency, duration, intensity, and vowel quality were measured across Persian speakers’ production of the stressed and unstressed syllables of five English disyllabi...
Cross-Linguistic Study of the Production of Turn-Taking Cues in American English and Argentine Spanish
We present the results of a series of machine learning experiments aimed at exploring the differences and similarities in the production of turn-taking cues in American English and Argentine Spanish. An analysis of prosodic features automatically extracted from 21 dyadic conversations (12 En, 9 Sp) revealed that, when signaling Holds, speakers of both languages tend to use roughly the same comb...
Generalizing prosodic prediction of speech recognition errors
Since users of spoken dialogue systems have difficulty correcting system misconceptions, it is important for automatic speech recognition (ASR) systems to know when their best hypothesis is incorrect. We compare results of previous experiments which showed that prosody improves the detection of ASR errors to experiments with a new system and new domain, the W99 conference registration system. O...
Language identification on code-switching utterances using multiple cues
Code-switching speech is an utterance containing two or more languages. Usually, the switch occurs at the clause or word level. In this paper, a two-stage framework is proposed, consisting of a language identifier followed by a speech recognizer, and evaluated on Mandarin-Taiwanese code-switching utterances. In the language identifier, we use multiple cues including acoustic, prosodic and p...
Predicting Automatic Speech Recognition Performance Using Prosodic Cues
In spoken dialogue systems, it is important for a system to know how likely a speech recognition hypothesis is to be correct, so it can reprompt for fresh input, or, in cases where many errors have occurred, change its interaction strategy or switch the caller to a human attendant. We have discovered prosodic features which more accurately predict when a recognition hypothesis contains a word e...
Publication date: 2001